ABSTRACT



In a step toward creating a robust natural language
understanding system which detects and avoids miscommunication, this 
artificial intelligence research report provides a taxonomy of 
miscommunication problems that arise in expert-apprentice dialogues 
(including misunderstandings, wrong communication, and bad
analogies), and proposes a flexible extension of the succeed/fail 
paradigm to handle reference mistakes. Extended examples of these 
reference failures are provided. A theory of relaxation (similar to 
human referent identification processes) for recovering from 
reference failures is then developed using the representational 
language "KL-One," which represents real-world objects hierarchically
(in the tradition of semantic networks and frames). Rule-based
relaxation is described as an integral part of the process whereby 
people who are asked to identify objects behave in a particular way: 
by finding candidates, re-trying, and if necessary, giving up and
asking for help. The study models the relaxation process and provides
a computational model for experimenting with the different parameters 
of the relaxation process. Extensive examples show how the model 
handles problems with imprecision and over-specification in a 
speaker's description using rules in a hierarchical knowledge base. 
Finally, the relaxation model is shown to be preferable to "closest 
match" models, which fail because looking for the closest match 
cannot determine the most salient features of a possible referent. 
The report indicates that the relaxation model avoids this problem by 
determining salient features using the kinds of knowledge humans have 
about language and the physical world. Many computer-language 
examples and diagrams supplement the discussion. A 6-page reference
list is appended. (SKC) 
Rule-Based Relaxation

Abstract 

The goal of this work is the enrichment of human-machine
interactions in a natural language environment. 1 We want to 
provide a framework less restrictive than earlier ones by 
allowing a speaker leeway in forming an utterance about a task 
and in determining the conversational vehicle to deliver it. A
speaker and listener cannot be assumed to have the same beliefs, 
contexts, perceptions, backgrounds or goals at each point in a 
conversation. As a result, difficulties and mistakes arise when 
a listener interprets a speaker's utterance. These mistakes can 
lead to various kinds of misunderstandings between speaker and 
listener, including reference failures or failure to understand 
the speaker's intention. We call these misunderstandings 
miscommunication. Such mistakes can slow down and possibly break 
down communication. Our goal is to recognize and isolate such 
miscommunications and circumvent them. This paper will highlight 
a particular class of miscommunication--reference problems--by
describing a case study and techniques for avoiding failures of 
reference. 
1. Introduction

Cohen, Perrault and Allen argued in their paper "Beyond 
Question Answering" (1981) that ". . . users of question-answering
systems expect them to do more than just answer isolated
questions--they expect systems to engage in
conversation. In doing so, the system is expected to allow users
to be less than meticulously literal in conveying their
intentions, and it is expected to make linguistic and pragmatic 
use of the previous discourse." Following in their footsteps, we 
want to build robust natural language processing systems that can 
detect and recover from miscommunication. The development of 
such systems requires a study on how people communicate and how 
they recover from miscommunication. This paper summarizes the 
results of a dissertation (Goodman, 1984) that investigates the 
kinds of miscommunication that occur in human communication with 
a special emphasis on reference problems, i.e., problems a
listener has determining whom or what a speaker is talking about. 
We have written computer programs and algorithms that demonstrate 
how one could solve such problems in a natural language 
understanding system. The study of miscommunication is a 
necessary task for natural language understanding systems since 
any computer capable of communicating with humans in natural 
language must be tolerant of the complex, imprecise, or
ill-devised utterances that people often use.

Our current research (Sidner, Bates, Bobrow, Brachman, 
Cohen, Israel, Schmolze, Webber, & Woods, 1981; Sidner, Bates, 
Bobrow, Goodman, Haas, Ingria, Israel, McAllester, Moser, 
Schmolze, & Vilain, 1983) views most dialogues as being cooperative
and goal-directed, i.e., a speaker and listener work together to
achieve a common goal. The interpretation of an utterance 
involves identifying the underlying plan or goal that the 
utterance reflects (Cohen, 1978; Allen, 1979; Sidner & Israel, 
1981; and Sidner, 1985). This plan, however, is rarely, if ever, 
obvious at the surface sentence level. A central issue is to 
transform sequences of complex, imprecise, or ill-devised 
utterances into well-specified plans that might be carried out by 
dialogue participants. Within this context, miscommunication can
occur.

We are particularly concerned with cases of miscommunication 
from the hearer's viewpoint, such as when the hearer is 
inattentive to, confused about, or misled about the intentions of 
the speaker. In ordinary exchanges, speakers usually make 
assumptions regarding what their listeners know about a topic of 
discussion. They will leave out details thought to be 
superfluous (Appelt, 1981; McKeown, 1983). Since the speaker 
really does not know exactly what a listener knows about a topic, 
it is easy to make statements that can be misinterpreted or not 
understood by the listener because not enough details were 
presented. One principal source of trouble is the descriptions 
constructed by the speaker to refer to actual objects in the 
world. A description can be imprecise, confused, ambiguous or
overly specific, or might be interpreted in the wrong context. As
a result, the listener cannot determine what object is being
described (we will call these errors "misreference"). The
descriptions, which cause reference identification failure, are
"ill-formed." The blame for ill-formedness may lie partly with
the speaker and partly with the listener. The speaker may have
been sloppy or not taken the hearer into consideration; the
listener may be either remiss or unwilling to admit he can't
understand the speaker and to ask the speaker for clarification,
or may simply believe that he has understood when he, in fact, has
not.

This work provides a new way to look at reference that 
involves a more active, introspective approach to repairing 
communication. It redefines the notion of finding a referent 
since the previous paradigms proved inappropriate in the real 
world, given the data we've analyzed. We introduce a new process 
called "negotiation" that is used when reference fails, and we 
illustrate this by introducing a new computational model called 
FWIM, for "Find What I Mean." We develop a theory called 
extensional reference miscommunication that will help explain how 
people successfully use imperfect descriptions. 

The last part of this section provides an introduction to 
the work and the methodology used. Section 2 of this paper 
highlights some aspects of normal communication and then provides 
a general discussion on the types of miscommunication that occur 
in conversation, concentrating primarily on reference problems 
and illustrating them with examples. Section 3 presents initial 
solutions to some of the problems of miscommunication. 
1.1 The Domain and Methodology

We are following the task-oriented paradigm of Grosz (1977)
since it is easy to study (through videotapes), it places the 
world in front of you (a primarily extensional world), and it 
limits the discussion while still providing a rich environment 
for complex descriptions. The task chosen as the target for the
system is the assembly of a toy water pump. The water pump is 
reasonably complex, containing four subassemblies that are built 
from plastic tubes, nozzles, valves, plungers, and caps that can 
be screwed or pushed together. A large corpus of dialogues 
concerning this task was collected by Cohen (1981, 1984; Cohen, 
Fertig, & Starr, 1982). These dialogues contained instructions 
from an "expert" to an "apprentice" that explain the assembly of 
the toy water pump. Both participants were working to achieve a 
common goal--the successful assembly of the pump. This domain is
rich in perceptual information, allowing for complex descriptions 
of elements in it. The data provide examples of imprecision, 
confusion, and ambiguity, as well as attempts to correct these 
problems.

In the following exchange, A is instructing J to assemble
part of the water pump. Refer to Figure 1(a) for a picture of 
the pump. A and J are communicating verbally, but neither can see 
the other. (The bracketed text in the excerpt tells what was 
actually occurring while each utterance was spoken.) Notice the 
complexity of the speaker's descriptions and the resultant 
processing required by the listener. This dialogue illustrates 
that (1) listeners repair the speaker's description in order to 
find a referent, (2) they repair their initial reference choice
once they are given more information, and (3) they can fail to
choose a proper referent. In Line 7, A describes the two holes
on the BASEVALVE as "the little hole." J, realizing that A
doesn't really mean "one" hole but "two," must repair the
description. J does, since he doesn't complain about A's
description, and correctly attaches the BASEVALVE to the
TUBEBASE. Figure 1(b) shows the pump after the TUBEBASE is
attached to the MAINTUBE in Line 10. In Line 13, J interprets "a
red plastic piece" to refer to the NOZZLE. When A adds the
relative clause "that has four gizmos on it," J is forced to drop
the NOZZLE as the referent and to select the SLIDEVALVE. In
Lines 17 and 18, A's description "the other--the open part of the
main tube, the lower valve" is ambiguous, and J selects the wrong
site, namely the TUBEBASE, in which to insert the SLIDEVALVE.
Since the SLIDEVALVE fits, J doesn't detect any trouble. Lines
20 and 21 keep J from thinking that something is wrong because
the part fits loosely. In Lines 27 and 28, J indicates that A 
has not given him enough information to perform the requested 
action. In Line 30, J further compounds the error in Line 18 by 
putting the SPOUT on the TUBEBASE. 

Excerpt 1 (Telephone)

A: 1. Now there's a blue cap

[J grabs the TUBEBASE] 

2. that has two little teeth sticking 

3. out of the bottom of it. 

J: 4. Yeah. 

A: 5. Okay. On that take the 

6. bright shocking pink piece of plastic 

[J takes BASEVALVE] 
7. and stick the little hole over the teeth.

[J starts to install the BASEVALVE,
backs off, looks at it
again and then goes ahead
and installs it]

J: 8. Okay.

A: 9. Now screw that blue cap onto
10. the bottom of the main tube.

[J screws TUBEBASE onto MAINTUBE]

J: 11. Okay. 

A: 12. Now, there's a--

13. a red plastic piece 

[J starts for NOZZLE] 

14. that has four gizmos on it. 

[J switches to SLIDEVALVE] 

J: 15. Yes. 



A: 16. Okay. Put the ungizmoed end in the uh 

17. the other --the open 

18. part of the main tube, the lower valve. 

[J puts SLIDEVALVE into hole in 
TUBEBASE, but A meant 
OUTLET2 of MAINTUBE]

J: 19. All right. 

A: 20. It just fits loosely. It doesn't 

21. have to fit right. Okay, then take 

22. the clear plastic elbow joint. 

[J takes SPOUT] 

J: 23. All right. 

A: 24. And put it over the bottom opening, too. 

[J tries installing SPOUT on 
TUBEBASE]

J: 25. Okay. 

A: 26. Okay. Now, take the--

J: 27. Which end am I supposed to put it over?
28. Do you know?
A: 29. Put the--put the--the big end--
30. the big end over it.

[J pushes big end of SPOUT on
TUBEBASE, twisting it
to force it on]

The example illustrates the complexity of reference
identification in a task-oriented domain. It shows that people
do not always give up when a speaker's description isn't perfect
but that they try to plow ahead anyway. The rest of this paper
will formalize the kinds of problems that occur during reference
and then extend the reference paradigm to get around many of
them. 
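
The incremental repair that J performs in Lines 13 and 14 of Excerpt 1 can be sketched in a few lines of Python. This is only an illustrative sketch and not the FWIM program: the feature names, the part table, and the assumption that the listener always settles on the first (most salient) matching part are hypothetical.

```python
# Hypothetical part table. List order stands in for perceptual
# salience, which is an assumption made only for this sketch.
PARTS = [
    {"name": "NOZZLE", "color": "red", "material": "plastic", "gizmos": 0},
    {"name": "SLIDEVALVE", "color": "red", "material": "plastic", "gizmos": 4},
]


def resolve_incrementally(parts, clauses):
    """Return the listener's referent choice after each clause is heard.

    Each clause adds one (feature, value) constraint. The choice is
    the first part that satisfies every constraint heard so far, or
    None when nothing fits.
    """
    constraints = {}
    choices = []
    for feature, value in clauses:
        constraints[feature] = value
        matches = [
            part["name"]
            for part in parts
            if all(part.get(f) == v for f, v in constraints.items())
        ]
        choices.append(matches[0] if matches else None)
    return choices


# "a red plastic piece ... that has four gizmos on it"
choices = resolve_incrementally(
    PARTS, [("color", "red"), ("material", "plastic"), ("gizmos", 4)]
)
print(choices)
```

Under these assumptions the first two clauses leave the NOZZLE as the choice, and the relative clause forces the switch to the SLIDEVALVE, mirroring J's behavior in the excerpt.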
Figure 1: The Toy Water Pump

2. Miscommunication

People must and do manage to resolve lots of (potential) 
miscommunication in everyday conversation. Much of it seems to 
be resolved subconsciously--with the listener unconcerned that
anything is wrong. Other miscommunication is resolved with the 
listener actively deleting or replacing information in the 
speaker's utterance until it fits the current context. Sometimes 
this resolution is postponed until the questionable part of the 
utterance is actually needed. Still, when all these fail, the 
listener can ask the speaker to clarify what was said. 2 

There are many aspects of an utterance that can confuse the 
listener and lead to miscommunication. The listener can become 
confused about what the speaker intends for the objects, the
actions, and the goals described by the utterance. Confusions
often appear to result from conflict between the current state of
the conversation, the overall goal of the speaker, and the manner
in which the speaker presented the information. However, when
the listener steps back and is able to discover what kind of
confusion is occurring, then that confusion can be resolved.

2.1 Causes of Miscommunication

Task-oriented conversations have a specific goal to be 
achieved: the performance of a task (e.g., the air compressor 
assembly in Grosz (1977)). The participants in the dialogue can
have the same skill level, and they can work together to
accomplish the task; or one of them, the expert, could know more
and direct the other, the apprentice, to perform the task. We
have concentrated primarily on the latter case--due to the
protocols that we examined--but many of our observations can be
generalized to the former case, too. 

The viewpoints of the expert and apprentice differ greatly 
in exchanges. The expert, understanding the functionality of the 
elements in the task, has more of a feel for how they work and go 
together, and how they can be used. The apprentice normally has 
no such knowledge and must base his decisions on his perceptions 
such as shape (Grosz, 1981). 

The structure of the task affects the structure of the 
dialogue (Grosz, 1977), as the expert and apprentice accomplish 
each step of the task. The common center of attention of the
dialogue participants is called the focus (Grosz, 1977; Reichman,
1978; and Sidner, 1979). Shifts in focus correspond to shifts
between the tasks and subtasks. Focus and focus shifts are
governed by many rules (Grosz, 1977; Reichman, 1978; and Sidner,
1979). Confusion may result when expected shifts do not take
place. For example, if the expert changes focus to some object
but does not talk about the object soon after its introduction
(i.e., before it is used), without digressing in a well-structured
way (see Reichman, 1978), or never discusses its subpieces
(such as an obvious attachment surface), then the apprentice may
become confused, leaving him ripe for miscommunication. The
reverse influence between focus and objects can lead to trouble,
too. A shift in focus by the expert that does not have a
manifestation in the apprentice's world will also perplex
him. 
Focus also influences descriptions (Grosz, 1981; Appelt,
1981). The level of detail required in a description depends 
directly on the elements currently in focus. If the object to be 
described is similar to other elements in focus, the expert must 
be more specific in formulating the description or may consider 
shifting focus away from the confusing objects. 
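
The dependence of descriptive detail on the focus set can be illustrated with a small Python sketch. It is a hypothetical illustration, not a model taken from the cited work: the objects, the feature names, and the fixed order in which features are tried are all assumptions.

```python
def describe(target, in_focus, feature_order):
    """Pick just enough features of `target` to exclude its rivals.

    Features are tried in `feature_order` (an assumed preference
    order). A feature is kept only if it rules out at least one
    remaining rival. Returns the chosen features and any rivals that
    could not be excluded.
    """
    rivals = [obj for obj in in_focus if obj is not target]
    chosen = {}
    for feature in feature_order:
        if not rivals:
            break
        value = target[feature]
        remaining = [obj for obj in rivals if obj.get(feature) == value]
        if len(remaining) < len(rivals):
            chosen[feature] = value
            rivals = remaining
    return chosen, rivals


# Hypothetical objects, named only for this example.
red_small = {"color": "red", "size": "small"}
red_large = {"color": "red", "size": "large"}
blue_small = {"color": "blue", "size": "small"}

# Only a blue rival is in focus, so color alone suffices.
sparse = describe(red_small, [red_small, blue_small], ["color", "size"])

# A similar red object is also in focus, so size must be added.
crowded = describe(
    red_small, [red_small, red_large, blue_small], ["color", "size"]
)
print(sparse, crowded)
```

When a similar object enters the focus set, the same target needs a longer description, which is the effect the paragraph above describes.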

2.1.2 Discrepancies in knowledge and miscommunication.
Just as with discrepancies in focus, discrepancies in knowledge 
between the speaker and listener can cause miscommunication. 
These disagreements can occur because the listener does not bring 
sufficient knowledge and the speaker fails to convey enough 
information to give him the knowledge sufficient to perform the 
task (that knowledge becomes shared or mutually believed
knowledge (Clark & Marshall, 1981; Perrault & Cohen, 1981; Joshi,
1982; Nadathur & Joshi, 1983)). The speaker and listener could
also have different beliefs. For example, they could differ on 
what each believes about the other, which can lead to false 
assumptions that each may use when interpreting the other's
utterances. Knowledge differences, though, can sometimes provide
a means to help detect miscommunication. For example, a
listener's knowledge about the world in which the task is taking
place can provide a way of checking whether or not a speaker's 
utterance is realistic. 

Knowledge the listener brings to the task. In
apprentice-expert dialogues such as those about the water pump, the
knowledge brought to the task by a naive apprentice is limited to
four principal areas: (1) language abilities, (2) perceptual
abilities to identify objects, (3) past experience and knowledge
in assembling objects, and (4) the ability to perform
trial-and-error tests in the real world. The language abilities of the
apprentice allow him to follow the flow of information provided 
by the expert in his utterances and descriptions. This knowledge 
about language is syntactic, semantic and pragmatic. 

Perceptual abilities include recognizing physical features 
of an object such as its size, shape, color, location, 
composition and transparency. The fineness of each category's 
partitioning varies among individuals. For example, some people 
know more color values than others. An expert, if he wishes to 
prevent misreference, may choose to use only basic level
descriptions in each category until the apprentice demonstrates a 
broader knowledge, or the expert can familiarize the apprentice 
with other values. 

The past experience someone has with objects provides a 
method for the expert to tie a description down to a common point 
of view. If an object has a familiar name, the expert can refer 
to it by that name. The expert can also refer by making 
analogies to everyday objects through shapes or functions as a 
model for the apprentice in his selection of a referent. The 
same holds true for actions--past experience makes it easier for
the expert to describe an action to the apprentice. 

Finally, the apprentice brings to a task the ability to 
perform simple tests. He can experiment to determine whether two 
pieces can be attached. In the water pump domain, attachment is 
performed by pushing, twisting or screwing one object into or 
onto another. How good a fit is can be determined by noting the
compatibility of the shapes of the attaching surfaces (and this
can be used to align the surfaces) and by checking the snugness 
of the fit once the objects are attached. 

The knowledge transferred in an utterance. At least two
kinds of knowledge are conveyed in an utterance. For this paper
we will focus on task knowledge and communicative knowledge.
Task knowledge about the specific domain is used to fill the
propositional content of an utterance. In the water pump domain
it refers to: (1) the objects, the set of parts available to
accomplish the task (i.e., the "real world" which is the physical
environment around the conversational participants); (2) the
actions, the set of physical actions available to the listener;
and (3) instructions linking objects and actions together to
achieve some goal. 

Communicative knowledge consists of speech acts, 
communicative goals, and communicative actions. Speech acts are 
underlying forms that are performed by the speaker in expressing 
an utterance (e.g., REQUEST, INFORM) (Searle, 1969; Cohen, 1978; 
and Allen, 1979). They provide an illocutionary force that is 
applied to the proposition expressed. Communicative goals 
reflect the structure of the discourse (e.g., setting up a topic,
clarifying, or adding more information (Allen, Frisch, & Litman,
1982)). They express how an utterance is to be understood with
respect to the high-level communicative goals reflected in the 
structure of the dialogue and, hence, how the task the utterance 
examines is performed. A communicative act is a way of 
accomplishing the goal that one wants to achieve (e.g., communicate the
goal, communicate the object's description, communicate the 
action). Only some of the possible acts may be reasonable at any 
one time to reach the current communicative goal (Reichman, 1981; 
Allen, Frisch, & Litman, 1982; Litman, 1983). 

Miscommunication can occur because of the way the 
information was transferred (e.g., communicative knowledge) or 
the content (e.g., task knowledge). Task knowledge-based
miscommunication occurs when the speaker is unaware that the 
listener (1) has a different view of the task, (2) is considering 
a different subset of objects, or (3) is considering a different 
subset of actions, and so on. Difficulties with communicative 
knowledge can occur when the speaker uses the wrong speech act 
(e.g., utters something inadvertently that would be 
conventionally interpreted as an INFORM when meant as a REQUEST) 
or when the listener errs in interpreting the speaker's intention
(e.g., the speaker may be INFORMing the listener that the blue
cap fits around the end of the tube but the listener might 
interpret the utterance as a REQUEST to actually place the cap 
around the end of the tube). In both cases it is the effect of 
the speech act that causes the trouble since it influences what 
the listener will do (i.e., determine the intended responses). 
Finally, communicative knowledge can cause mistakes and confusion 
if the listener and speaker differ on the goal (e.g., the 
listener might think the speaker is clarifying previous 
information when, in fact, the speaker is adding new 
information). They will feel they are communicating at cross
purposes--leading to frustration.

2.2 Instances of Miscommunication

In this section we will present evidence that people do 
miscommunicate and yet they often manage to repair reference 
failures. We will look at specific forms of miscommunication and 
describe ways to detect them and will demonstrate ways for 
resolving some miscommunication problems. 

There are many ways hearers can get confused during a 
conversation. Figure 2 outlines some of them that were derived 
from analyzing the water pump protocols. We will only discuss 
referent confusion in this paper. The other forms of confusion--
Action, Goal, and Cognitive Load--are described in Goodman
(1982, 1984). Another categorization of confusions that lead to
conversation failure can be found in Ringle and Bruce (1981). 
Referent confusion occurs when the listener is unable to
determine correctly what the speaker is referring to. It may
occur when the descriptions in the utterance are ambiguous or
imprecise, when there is confusion between the speaker and
listener about what the current focus or context is, or when the
descriptions are either incorrect or incompatible with the 
current or global context. 

This section defines and illustrates many of the confusions 
through numerous excerpts. Each excerpt has marked in
parentheses the mode of communication that was used in the excerpt
(face-to-face, over the telephone, and so forth). A description of
the collection of these excerpts can be found in (Cohen, 1984).
Each bracketed portion of the excerpt explains what was occurring 
at that point in the dialogue. 

Erroneous specificity. A speaker's overspecific or
underspecific descriptions can lead to mistakes on the part of 
the listener even though, technically, nothing is wrong with the 
description. 

A request is overspecific if extra details are given that 
seem obvious to the listener (Grosz, 1978). Since the listener 
would not expect the speaker to provide him with obvious details,
he might think that he had done something incorrectly as the task 
seemed easier than the one apparently described by the speaker. 3 
For example, in Excerpt 2, S's description of the bubbled piece 
(i.e., the AIRCHAMBER) is overspecific because it supplies many 
more features than needed to identify the piece. The extra 
description in Lines 15 to 17 confused the listener who appeared 
to have correctly identified the piece by Line 13 but ended up
taking the wrong one when the expert kept adding more details.
See Excerpt 10 in the section on bad analogies for other related
examples of overspecificity.

Excerpt 2 (Telephone) 

S: 1. Okay?

2. Now you have two devices that

3. are clear plastic

[J picks up MAINTUBE and SPOUT]

J: 4. Okay.

S: 5. One of them has two openings

6. on the outside with threads on

7. the end, and it's about five

8. inches long.

[J rotates MAINTUBE confirming
S's description]

9. Do you see that?

J: 10. Yeah.

S: 11. Okay,

12. the other one is a bubbled

13. piece with a blue base on it

14. with one spout.

[J looks at

15. Do you see it?

16. About two inches long.

[J picks up STAND and drops
MAINTUBE]

17. Both of these are tubular.

[J puts down SPOUT]

J: 18. Okay.

19. not the bent one.

[J puts down SPOUT]
Ambiguous descriptions are underspecified and can cause
confusion about the referent. Excerpt 3 below illustrates a case
where the speaker's description does not provide enough detail
to prune the set of possible referents down to one. 

Excerpt 3 (Face-to-Face) 

S: 1. And now take the little red

2. peg,

[P takes PLUG]

3. Yes,

4. and place it in the hole at the

5. green end,

[P starts to put PLUG into
OUTLET? of MAINTUBE]

6. no

7. the--in the green thing

[P puts PLUG into green part of 
PLUNGER] 

P: 8. Okay. 

In Lines 4 and 5, S describes the location to place a peg into a 
hole by giving spatial information. Since the location is given 
relative to another location by "in the hole at the green end,"
it defines a region where the peg might go instead of a specific
location. In this particular case, there are three possible
holes to choose from that are near the green end. The listener
chooses one--the wrong one--and inserts the peg into it. Because
this dialogue took place face to face, S is able to correct the 
ambiguity in Lines 6 and 7. 

An underspecified description can be imprecise in many 
possible ways. It may consist of features that do not readily 
apply or that are inappropriate in the domain. In Line 3, 
Excerpt 4, the feature "funny" has no meaning to the listener 
here. It is not until E provides a fuller description in Lines 5
to 8 that A is able to select the proper piece.

A description may use imprecise feature values. For 
example, one could use an imprecise head noun coupled with few or 
no feature values (and context alone does not necessarily suffice 
to distinguish the object). In Excerpt 5, Line 9, "attachment"
is imprecise because all objects in the domain are attachable
parts. The expert's use of "attachment" was most likely to 
signal the action the apprentice can expect to take next. The 
use of the feature value "clear" provides little benefit either 
because three clear, unused parts exist. The size descriptor 
"little" prunes this set of possible referents down to two 
contenders. Another use of imprecise feature values occurs when 
enough feature values are provided but at least one is too 
imprecise. In Excerpt 6, Line 3, the use of "rounded" to 
describe the shape does not sufficiently reduce the set of four 
possible referents (though, in this particular instance, A 
correctly identifies it) because the term is applicable to 
numerous parts. 4 A more precise shape descriptor such as
"bell-shaped" or "cylindrical" would have been beneficial to the
listener. 
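
The way each word in "the clear little attachment" does or does not narrow the set of possible referents can be sketched as a simple filter. The sketch below is illustrative only. The candidate table is hypothetical: the part names and feature values are placeholders chosen to reproduce the counts reported for Excerpt 5, not the actual inventory of the water pump.

```python
# Hypothetical unused parts. Names and values are placeholders chosen
# so that three parts are clear and two of those are little.
CANDIDATES = [
    {"name": "clear_part_1", "kind": "attachment", "color": "clear", "size": "little"},
    {"name": "clear_part_2", "kind": "attachment", "color": "clear", "size": "little"},
    {"name": "clear_part_3", "kind": "attachment", "color": "clear", "size": "large"},
    {"name": "red_part", "kind": "attachment", "color": "red", "size": "little"},
    {"name": "blue_part", "kind": "attachment", "color": "blue", "size": "large"},
]


def prune(candidates, description):
    """Filter candidates by each (feature, value) pair in turn.

    Returns the surviving candidates and a trace recording how many
    candidates remain after each descriptor is applied.
    """
    survivors = list(candidates)
    trace = []
    for feature, value in description:
        survivors = [c for c in survivors if c.get(feature) == value]
        trace.append((value, len(survivors)))
    return survivors, trace


survivors, trace = prune(
    CANDIDATES,
    [("kind", "attachment"), ("color", "clear"), ("size", "little")],
)
print(trace)
```

With these placeholder values the head noun excludes nothing, "clear" leaves three candidates, and "little" leaves two, so the description still fails to single out one referent.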

Excerpt 4 (Telephone) 
E: 1. All right. 

2 . Now . 

3. There's another funny little 

4. red thing, a 

[A is confused, examines both 

NOZZLE and SLIDEVALVE] 
5. little teeny red thing that's

6. some--should be somewhere on

7. the desk, that has um--there's

8. like teeth on one end.

[A takes SLIDEVALVE]

A: 9. Okay.

E: 10. It's a funny-loo--hollow,

11. hollow projection on one end

12. and then teeth on the other.

Excerpt 5 (Teletype)

A: 1. take the red thing with the

2. prongs on it

3. and fit it onto the other hole

4. of the cylinder

5. so that the prongs are

6. sticking out

R: 7. ok

A: 8. now take the clear little

9. attachment

10. and put on the hole where you

11. just put the red cap on

12. make sure it points

13. upward

R: 14. ok





Excerpt 6 (Teletype) 


S: 1. ok,

2. put the red nozzle on the outlet

3. of the rounded clear chamber

4. ok?

A: 5. got it.



Improper focus. Earlier we talked about focus and problems
that occur due to it. In this section, we discuss how misfocus
can cause misreference. Focus confusion can occur when the
speaker sets up one focus and then proceeds with another, without
giving the listener any indication of the switch. The opposite
phenomenon can also happen--the listener may feel that a focus
shift has taken place when the speaker actually never intended
one. These really are very similar--one is viewed more strongly
from the perspective of the speaker and the other from the
listener.

Excerpt 7 below illustrates an instance of the first type of 
focus confusion. In the excerpt, the speaker (S) shifts focus
without notifying the listener (P) of the switch. As the excerpt
begins, P is holding the TUBEBASE. S provides in Lines 1 to 16
instructions for P to attach the CAP and the SPOUT to OUTLET1
and OUTLET2, respectively, on the MAINTUBE. When P successfully
completes these attachments, S switches focus in Lines 17 to 20
to the TUBEBASE assembly and requests P to screw it on to the
bottom of the MAINTUBE. While P completes the task, S realizes
she left out a step in the assembly--the placement of the
SLIDEVALVE into OUTLET2 of the MAINTUBE before the SPOUT is
placed over the same outlet. S attempts to correct her mistake
by requesting P to remove "the plas" 5 piece in Lines 22 and 23.
Since S never indicated a shift in focus from the TUBEBASE back
to the SPOUT, P interprets "the plas" to refer to the TUBEBASE.

Excerpt 7 (Face-to-Face) 

S: 1. And place 

2. the blue cap that's left 

[P takes CAP] 

3. on the side holes that are

4. on the cylinder, 

[P lays down TUBEBASE] 

5. the side hole that is farthest 

6. from the green end.

[P puts CAP on 0UTLET1 of MAINTUBE] 

P: 7. Okay.

S: 8. And take the nozzle-looking

9. piece,

[P grabs NOZZLE] 

10. no 

11. I mean the clear plastic one,

[P takes SPOUT]

12. and place it on the other hole

[P identifies OUTLET2 of MAINTUBE]

13. that's left, 

14. so that nozzle points away 

15. from the

[P installs SPOUT on OUTLET2 of
MAINTUBE]

16. right. 
P: 17. Okay. 
S: 18. Now 

19. take the 

20. cap base thing 

[P takes TUBEBASE] 

21. and screw it onto the bottom, 

[P screws TUBEBASE on MAINTUBE] 

22. ooops,

[S realizes she has forgotten to
have P put SLIDEVALVE
into OUTLET2 of
MAINTUBE]

23. un-undo the plas 

[P starts to take TUBEBASE off 
MAINTUBE] 

24. no 

25. the clear plastic thing that I 

26. told you to put on 

[P removes SPOUT] 

27. sorry. 

28. And place the little red thing 

[P takes SLIDEVALVE]

29. in there first,

[P inserts SLIDEVALVE into OUTLET2
of MAINTUBE]

30. it fits loosely in there.

Excerpt 8 below demonstrates the focus confusion that occurs 
when the speaker (S) sets up one focus--the MAINTUBE, the correct
focus in this case--but then proceeds in such a manner that the
listener (J) thinks a focus shift to another piece, the TUBEBASE,
has occurred. Thus, Line 15, "a bottom hole," refers to "the
lower side hole in the MAINTUBE" for S and "the hole in the
TUBEBASE" for J. J has no way of realizing that he has focused
incorrectly unless the description as he interprets it doesn't
have a real world correlate (here something does satisfy the
description so J doesn't sense any problem) or if, later in the
exchange, a conflict arises due to the mistake (e.g., a requested
action cannot be performed). In Line 31, J inserts a piece into
the wrong hole because of the misunderstanding in Line 15. Line
31 hints that J may have become suspicious that an ambiguity
existed somewhere in the previous conversation but since the task
appeared to be successfully completed (i.e., the red piece fit
into the hole in the base), and since S did not provide any
clarification, he assumed he was correct.

Excerpt 8 (Telephone) 

S: 1. Um now.

2. Now we're getting a little

3. more difficult.

J: 4. (laughs)

S: 5. Pick out the large air tube

[J picks up STAND]

6. that has the plunger in it.

[J puts down STAND, takes
assembly]

J: 7. Okay. 

S: 8. And set it on its base,

[J puts down MAINTUBE,
standing vertically, on
the TABLE]

9. which is blue now, 
10. right? 

[J has shifted focus to the 
TUBEBASE] 

J: 11. Yeah. 

S: 12. Base is blue.

13. Okay,

14. Now

15. You've got a bottom hole still 

16. to be filled, 

17. correct? 

J: 18. Yeah. 

[J answers this with MAINTUBE still
sitting on the TABLE; he
shows no indication of
what hole he thinks is
meant--the one on the
MAINTUBE, OUTLET2, or the
one in the TUBEBASE]

S: 19. Okay.

20. You have one red piece

21. remaining?

[J picks up MAINTUBE assembly and 
looks at TUBEBASE, 
rotating the MAINTUBE so 
that TUBEBASE is pointed 
up, and sees the hole in 
it; he then looks at the 
SLIDEVALVE] 

J: 22. Yeah.

S: 23.

24. Take that red piece; 

[J takes SLIDEVALVE] 

25. It's got four little feet on 

26. it?

J: 27. Yeah.

S: 28. And put the small end into 

29. that hole on the air tube--

30. on the big tube.

J: 31. On the very bottom?

[J starts to put it into the bottom
hole of TUBEBASE--though
he indicates he is unsure
of himself]

S: 32. On the bottom. 
33. Yes. 

Misfocus can also occur when the speaker inadvertently fails 
to distinguish the proper focus because he did not notice a 
possible ambiguity, or when, through no fault of the speaker, the
listener just fails to recognize a switch in focus. Excerpt 8
above is an example of the first type because S failed to notice
that an ambiguity existed since he never explicitly brought the
TUBEBASE either into or out of focus. He just assumed that J had
the same perspective as he had--a perspective in which there was
no ambiguity. 

Wrong context. Context differs from focus. The context of 
a portion of a conversation is concerned with the intention of 
the discussion and with the set of objects relevant to that 
discussion, though not attended to currently. Focus pertains to 
the elements which are currently being attended to in the 
context. For example, two people can share the same context but 
have different focus assignments within it--we are both talking
about the water pump but you are describing the MAINTUBE and I am
describing the AIRCHAMBER. Alternatively, we could just be using
different contexts--I think you are talking about taking the pump
apart but you are talking about replacing the pump with new
parts; in both cases we may be sharing the same focus--the pump--
but our contexts are totally different from one another. 6 The
kinds of misunderstandings that can occur because of context
inconsistencies are similar to those for focus problems: (1) the
speaker might set up or use one context for a discussion and then
proceed in another one without letting the listener know of the
change, (2) the listener may feel that a change in context has
taken place when in fact the speaker never intended one, or (3) 
the listener may fail to recognize that the speaker has indicated 
a switch in context. Context affects reference identification 
because it helps define the set of available objects that are 
possible contenders for the referent of the speaker's 
descriptions. If the contexts of the speaker and listener 
differ, then misreference may result. 

Bad analogy. An analogy (see Gentner, 1980, for a
discussion) is a useful way to help describe an object by
attempting to be more precise by using shared past experience and
knowledge--especially shape and functional information. If that
past experience or knowledge doesn't contain the information the 
speaker assumes it does, then trouble occurs. Thus, an 
additional way referent confusion can occur is to describe an 
object using a poor analogy. 

An analogy can be improper for several reasons. It might 
not be specific enough--confusing the listener because several
potential referents might conform. Alternatively, the analogy
may fail because it is too difficult to discover a mapping
between the analogous object and something in the environment.
In Excerpt 9, J at first has trouble correctly satisfying A's 
functional analogy "stopper" in "the big blue stopper," but 
finally selects what he considers to be the closest match to 
"stopper." The problem for J was that A's functional analogy was 
not specific enough. It would have been better to use "cap" 
instead of "stopper." 

Excerpt 9 (Telephone) 

A: 1. Okay. Now, 

2. take the big blue 

3. stopper that's laying around 

[J grabs AIRCHAMBER] 

4. ... and take the black

5. ring-- 

J: 6. The big blue stopper? 

[J is confused and tries to
communicate it to A; he
is holding the AIRCHAMBER
here]

A: 7. Yeah, 

8. the big blue stopper 

9. and the black ring. 

[J drops AIRCHAMBER and takes the 
O-RING and the TUBEBASE] 

In other cases the analogy might be too specific and would 
confuse the listener because none of the available referents 
appear to fit it. In Line 8 of Excerpt 7, "nozzle-looking" is
poor because the object being referred to actually is an
elbow-shaped spout and not a nozzle. The "nozzle-looking" part of the
description convinced the listener that what he was looking for
was something identified by the typical properties of a nozzle
(which is a small tube used as an outlet). However, sometimes
when an object is a clear representative of a specified analogy
class, the apprentice will not think it is the intended referent.
He assumes that the expert would just directly describe the
object as a member of the class and not bother to form an
analogy. Hence, the apprentice may very well ignore the best
representative of the class for some less obvious exemplar.
Given the case just mentioned, it is therefore better to say
"nozzle" instead of "nozzle-looking." In Excerpt 10, the
descriptions "hippopotamus face shape" in Lines 2 and 3, and
"champagne top" in Line 9, are too specific and the listener is
unable to find something close enough to match either of them.
He can't discover a mapping between the object in the analogy and
one in the real world (a discussion on discovering such mappings
can be found in Gentner, 1980). In fact, when this excerpt was
played back to one listener, he was so overwhelmed by M's 
descriptions, that he exclaimed "What!" when he heard them and 
was unable to proceed. 

Excerpt 10 (Audiotape) 

M: 1. take the bright pink flat 

2. piece of hippopotamus face

3. shape piece of plastic

4. and you notice that the two

5. holes on it

[M is trying to refer to BASEVALVE]

6. match

7. along with the two 

8. peg holes on the 

9. champagne top sort of 

10. looking bottom that had 

11. threads on it 

[M is trying to refer to TUBEBASE]

Description incompatibility. Descriptions incompatible with
the scene can also lead to confusion. A description is
incompatible when it does not agree with the current state of the
world: (1) when one or more of the specified conditions, i.e.,
the feature values, are not satisfied by any of the pieces; (2) when
one or more specified constraints do not hold (e.g., saying "the
loose one" when all objects are tightly attached); or (3) when no
one object satisfies all of the features specified in the
description. In Lines 7 and 8 of Excerpt 10 above, M's
description of "the two peg holes" leads to bewilderment for the
listener because the "champagne top sort of looking bottom that
had threads on it" (i.e., the TUBEBASE) has no holes in it. M
actually meant "two pegs."

2.3 Detecting Miscommunication

Part of our research has been to examine how a listener
discovers the need to repair an utterance or description during 
communication. The incompatibility of a description or action 
with the scene is one signal of possible trouble. The appearance 
of a goal incompatibility such as an obstacle or redundancy that 
blocks one from achieving a goal is another indication of a 
potential problem. 

Description and action incompatibility. As we pointed out 
earlier, there are three kinds of possible incompatibility with
the scene--description, action and goal. The strongest hint that
there is a description incompatibility occurs when the listener
finds no real world object to correspond to the speaker's
description (i.e., referent identification fails). This can
occur (1) when one or more of the specified feature values in the
description are not satisfied by any of the pieces (e.g., saying
"the orange cap" when none of the objects are orange); (2) when
one or more specified constraints do not hold (e.g., saying "the
red plug that fits loosely" when all the red plugs attach
tightly); or (3) when no one object satisfies all of the features
specified in the description (i.e., there is, for each feature, 
an object that exhibits the specified feature value, but no one 
that exhibits all the values). 
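
The three cases above can be illustrated with a small Python sketch. The scene representation (a list of feature dictionaries), the constraint predicates, and the function name `diagnose_description` are assumptions made for illustration only; they are not part of the system described in this report.

```python
from typing import Callable, Dict, List

Piece = Dict[str, str]


def diagnose_description(
    features: Dict[str, str],
    constraints: List[Callable[[Piece], bool]],
    scene: List[Piece],
) -> str:
    """Classify a description against a scene (hypothetical helper)."""
    # Case (1): a specified feature value is exhibited by no piece at all.
    for name, value in features.items():
        if not any(piece.get(name) == value for piece in scene):
            return "feature-value-unsatisfied"
    # Pieces that exhibit every specified feature value.
    matches = [p for p in scene
               if all(p.get(n) == v for n, v in features.items())]
    # Case (3): every value occurs somewhere, but no one piece has them all.
    if not matches:
        return "no-single-object"
    # Case (2): a specified constraint holds for none of the matching pieces.
    for holds in constraints:
        if not any(holds(p) for p in matches):
            return "constraint-violated"
    return "compatible"


scene = [
    {"color": "red", "type": "plug", "fit": "tight"},
    {"color": "blue", "type": "cap", "fit": "tight"},
]
# "the orange cap": no piece is orange.
print(diagnose_description({"color": "orange", "type": "cap"}, [], scene))
# "the red plug that fits loosely": the only red plug attaches tightly.
print(diagnose_description({"color": "red", "type": "plug"},
                           [lambda p: p["fit"] == "loose"], scene))
```

The sketch checks case (3) before case (2) only because constraints are evaluated on the pieces that already match the feature values; the report does not prescribe an order.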

An impossible reference could indicate an earlier action 
error (e.g., two parts were put together that never should have 
been). An action incompatibility problem is likely if (1) the 
listener cannot perform the action specified by the speaker 
because of some obstacle; (2) the listener performs the action 
but does not arrive at its intended effect (i.e., a specified or 
default constraint isn't satisfied); or (3) the current action 
affects a previous action in an adverse way, yet the speaker has 
given no sign that this side effect is important. Action 
incompatibility might indicate an earlier misreference (e.g., you 
chose the wrong part and used it in an earlier action). 

Goal obstacle. A goal obstacle occurs when a goal (or
subgoal) one is trying to achieve is blocked. This can result in
confusion for the listener because in general listeners do not
expect speakers to give them tasks that cannot be achieved.
Often, though, it points out for the listener that some
miscommunication, such as misreference, has occurred.

Goal redundancy. Goal redundancy occurs when the requested
goal (or subgoal) is already satisfied. This is a simple kind of
goal obstacle where the goal to be fulfilled is blocked because
it is already true and nothing has to be done to get around it.
However, it can lead to confusion on the part of listeners
because they may suspect that they misunderstood what the speaker
has requested since they wouldn't expect a reasonable speaker to
request them to perform an already completed action. It provides
a hint that miscommunication has occurred.

3. Repairing Reference Failures

3.1 Introduction

When confusions do occur, they must be resolved if the task
is to be performed. This section explores the problem of fixing
reference failures.

Reference identification is a search process where a 
listener looks for something in the world that satisfies a 
speaker's uttered description. A computational scheme for 
performing such identifications has evolved from work by other 
artificial intelligence researchers (see Grosz, 1977; Hoeppner, 
Christaller, Marburger, Morik, Nebel, O'Leary, & Wahlster, 1983).
That traditional approach succeeds if a referent is found and
fails if no referent is found (see Figure 3(a)). However, a
reference identification component must be more versatile than
those previously constructed. The excerpts above show that the
traditional approach is inadequate because people's real behavior
is much more complex. In particular, listeners often find the
correct referent even when the speaker's description does not
describe any object in the world. For example, a speaker could
describe a turquoise block as the "blue block." Most listeners
would go ahead and assume that the turquoise block was the one
the speaker meant since turquoise and blue are similar colors.

A key feature of reference identification is "negotiation,"
which comes in two forms. First,
it can occur between the listener and the speaker. The listener
can step back, expand greatly on the speaker's description of a
plausible referent, and ask for confirmation that he has indeed
found the correct referent. For example, a listener could
initiate negotiation with "I'm confused. Are you talking about
the thing that is kind of flared at the top? Couple inches long.
It's kind of blue." Second, negotiation can be with oneself.
This self-negotiation is the one that we are most concerned with
in this research. The listener considers aspects of the
speaker's description, the context of the communication, his own
abilities, and other relevant sources of knowledge. He then
applies that deliberation to determine whether one referent
candidate is better than another or, if no candidate is found,
what are the most likely places for error or confusion. Such
negotiation can result in the listener testing whether or not a
particular referent works. For example, linguistic descriptions
can influence a listener's perception of the world. The listener
must ask himself whether he can perceive one of the objects in
the world the way the speaker described it. In some cases, the
listener may overrule parts of the description because he cannot 
perceive it the way the speaker described it. 

To repair the traditional approach we have developed an
algorithm that captures for certain cases the listener's ability
to negotiate with himself for a referent. It can search for a
referent and, if it doesn't find one, it can try to find possible
referent candidates that might work, and then loosen the
speaker's description using knowledge about the speaker, the
conversation, and the listener himself. Thus, the reference
process becomes multi-step and resumable. This computational
model, which we call "FWIM" for "Find What I Mean," is more
faithful to the data than the traditional model (see Figure 3(b)).

Figure 3: Approaches to reference identification



One means of making sense of a failed description is to 
delete or replace portions of it that cause it not to match 
objects in the hearer's world. In our program we are using 
"relaxation" techniques for this. Our reference identification 
module treats descriptions as approximate. It relaxes a 
description in order to find a referent when the literal content 
of the description fails to provide the needed information. 
Relaxation, however, is not done blindly but is modelled on a
person's behavior. We have developed a computational model that
can relax aspects of a description using many of the sources of
knowledge used by people. Relaxation then becomes a form of
communication repair (in the style of the work on repair theory
found in Brown & VanLehn, 1980).
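
The contrast between the succeed/fail approach and a multi-step approach can be sketched in Python as follows. This is only an illustrative outline under simplifying assumptions: the scene, the table of similar colors, and all function names are hypothetical, and the actual FWIM model draws on far richer knowledge than a single color table.

```python
from typing import Callable, Dict, Iterable, List, Optional

Description = Dict[str, str]

# Hypothetical scene and similarity knowledge, for illustration only.
SCENE = {
    "block1": {"color": "turquoise", "type": "block"},
    "cap1": {"color": "red", "type": "cap"},
}
SIMILAR_COLORS = {"blue": ["turquoise"]}


def literal_search(description: Description) -> List[str]:
    """Traditional step: objects satisfying every specified feature."""
    return [name for name, obj in SCENE.items()
            if all(obj.get(k) == v for k, v in description.items())]


def color_relaxations(description: Description) -> Iterable[Description]:
    """Yield copies of the description with the color loosened."""
    for alternative in SIMILAR_COLORS.get(description.get("color", ""), []):
        yield {**description, "color": alternative}


def find_referent(
    description: Description,
    search: Callable[[Description], List[str]],
    relaxations: Callable[[Description], Iterable[Description]],
) -> Optional[str]:
    found = search(description)
    if len(found) == 1:
        return found[0]          # the traditional approach stops here
    if found:
        return None              # ambiguous; not handled in this sketch
    for relaxed in relaxations(description):
        found = search(relaxed)  # retry with a loosened description
        if len(found) == 1:
            return found[0]
    return None                  # give up and ask the speaker for help


print(find_referent({"color": "blue", "type": "block"},
                    literal_search, color_relaxations))
```

Here the literal search for "the blue block" fails, and the retry with a similar color finds the turquoise block, mirroring the example given earlier in this section.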

3.2 The Referent Identifier and Relaxation Component

When a description fails to denote a referent in the real 
world properly, it is possible to repair it by a relaxation 
process that ignores or modifies parts of it. Since a
description can specify many features of an object, and relaxing
in different orders could yield matches to different objects, the
order in which parts of it are relaxed is crucial. There are
several kinds of relaxation possible. One can ignore a
constituent, replace it with a related value, or change focus
(i.e., consider a different group of objects). This section
describes the overall relaxation component of the referent
identifier and how it draws on knowledge sources about
descriptions and the real world as it tries to relax an errorful
description and find one for which a referent can be identified.

3.2.1 Find a referent using a reference mechanism.
Identifying the referent requires finding an element in the world 
that corresponds to the speaker's description (where every 
feature specified in the description is present in the element in 
the world but not necessarily vice versa). This process 
corresponds to the technique employed in the traditional 
reference mechanism. The initial task is to determine whether or
not a search of the (taxonomic) knowledge base that we use to
model the world is necessary. For example, in the water pump
domain, the reference component should not bother searching--
unless specifically requested to do so--for a referent for
indefinite noun phrases (which usually describe new or
hypothetical objects) or extremely vague descriptions (which are
ambiguous because they do not clearly describe an object since
they are composed of imprecise feature values). A number of
aspects of discourse pragmatics can be used in that 
determination. For example, the use of a deictic in a definite 
noun phrase, such as "this X" or "the last X," hints that the 
object was either mentioned previously or that it probably was 
evoked by some previous reference, and that it is searchable. We 
will not examine such aspects any further in this paper. 

The knowledge base contains linguistic descriptions and a 
description of the listener's visual scene. In our 
implementation and algorithms, we assume it is represented in 
KL-One (Brachman, 1977), a system for describing taxonomic
knowledge. KL-One is composed of CONCEPTs, ROLEs on concepts,
and links between them. A CONCEPT denotes a set, representing
those elements described by it. A SUPERC link ("==>") is used
between concepts to show set inclusion. It defines a property
called "subsumption" that specifies that the set denoted by one
concept is included in the other. For example, consider Figure 4.
The SuperC from Concept B to Concept A is like stating B ⊂ A
for two sets A and B. An INDIVIDUAL CONCEPT is used to guarantee
that the set specified by a concept denotes a singleton set. The
Individual Concept D shown in the figure is defined to be a
unique member of the set specified by Concept C. ROLEs on
concepts are like attributes or slots in other knowledge 
representation languages. They define a functional relationship 
between the concept and other concepts that specifies a 
restriction on what can fill a particular slot. 
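
A minimal Python sketch of the subsumption relation may help fix ideas. It is not KL-One itself: the `Concept` class and its fields are hypothetical, Roles are omitted, and the placement of Concept C below Concept B is an assumption, since the text states only that B is below A and that D is an individual of C.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    name: str
    # SuperC links: the concepts that directly subsume this one.
    superconcepts: List["Concept"] = field(default_factory=list)
    # True for an Individual Concept (denotes a singleton set).
    individual: bool = False

    def is_subsumed_by(self, other: "Concept") -> bool:
        """True if this concept is 'below' other in the taxonomy."""
        if self is other:
            return True
        return any(s.is_subsumed_by(other) for s in self.superconcepts)


a = Concept("A")
b = Concept("B", [a])                    # SuperC from B to A
c = Concept("C", [b])                    # assumed placement of C
d = Concept("D", [c], individual=True)   # Individual Concept D

print(b.is_subsumed_by(a), d.is_subsumed_by(a), a.is_subsumed_by(b))
```

Following SuperC links upward is enough to answer "is X below Y?", which is the question the reference search asks of the taxonomy.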




Figure 4: A KL-One Taxonomy

Once a search of the knowledge base is considered necessary,
a reference search mechanism is invoked. The search mechanism 
uses the KL-One Classifier (Lipkis, 1982) to search the knowledge 
base taxonomy and is constrained by a focus mechanism based on
the one developed by Grosz (1977). The Classifier's purpose is
to discover all appropriate subsumption relationships between a
newly formed description and all other concepts in a given
taxonomy. With respect to reference, this means that
descriptions of all possible referents of the description will be
subsumed by the description after it has been classified into the
knowledge base taxonomy. If more than one candidate referent is
below (when a concept A is subsumed by B, we say A is "below" B)
the classified description, then, unless a quantifier in the
description specified more than one element, the speaker's
description is ambiguous. If exactly one concept is below it,
then the intended referent is assumed to have been found.
Finally, if no referent is found below the classified
description, the relaxation component can be invoked. Prior to
actually using the relaxation component, FWIM checks to see if
the problem lies not with the description but with
pragmatic issues. We will only consider the no reference case in
the rest of the paper. 
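
The three possible outcomes of the search can be summarized in a short Python sketch. The function name, the outcome labels, and the `expects_plural` flag (standing in for a quantifier that specifies more than one element) are assumptions for illustration.

```python
from typing import List, Tuple


def interpret_search(
    candidates_below: List[str], expects_plural: bool = False
) -> Tuple[str, List[str]]:
    """Interpret the concepts found below the classified description."""
    if not candidates_below:
        # No referent: the case in which relaxation may be invoked.
        return ("no-referent", [])
    if len(candidates_below) == 1 or expects_plural:
        return ("found", candidates_below)
    # Several candidates but the description asked for only one.
    return ("ambiguous", candidates_below)


print(interpret_search([]))
print(interpret_search(["SPOUT"]))
print(interpret_search(["OUTLET1", "OUTLET2"]))
```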

3.2.2 Collect votes for or against relaxing the 
description. If the referent search fails, then it is necessary
to determine whether the lack of a referent for a description has
to do with the description itself (i.e., reference failure) or
outside forces. For example, an external problem due to outside
forces may be with the flow of the conversation and the speaker's
and listener's perspectives on it; it may be due to incorrect
attachment of a modifier; it may be due to the action requested;
and so on. Pragmatic rules are invoked to decide whether or not
the description should be relaxed. For example, aspects of
focus, metonymy and synecdoche are considered to see if they
affected the referent search. These rules will not be discussed
here; we will assume that the problem lies in the speaker's
description.

3.2.3 Perform the relaxation of the description. If
relaxation is demanded, then the system must (1) find potential
referent candidates, (2) determine which features in the
speaker's description to relax and in what order, and use those
to order the potential candidates with respect to the preferred
ordering of features, and (3) determine the proper relaxation
techniques to use and apply them to the description.

Find potential referent candidates. Before relaxation takes
place, the algorithm looks for potential candidates for referents 
(which denote elements in the listener's visual scene). These 
candidates are discovered by performing a "walk" in the knowledge 
base taxonomy in the general vicinity of the speaker's classified 
description as partitioned by the focusing mechanism. A KL-One 
partial matcher is used to determine how close the candidate 
descriptions found during the walk are to the speaker's 
description. The partial matcher generates a numerical score to 
represent how well the descriptions match (after first generating 
scores at the feature level to help determine how the features 
are to be aligned and how well they match). This score is based 
on information about KL-One (e.g., the subsumption relationship 
between or the equality of two feature values) and does not take 
into account any information about the task domain. The set of 
best descriptions returned by the matcher (as determined by some 
cutoff score) is selected as the set of referent candidates. The 
ordering of features and candidates for relaxation described 
below takes into account the task domain. 
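
The idea of scoring candidates feature by feature and keeping those above a cutoff can be sketched as follows. This is not the KL-One partial matcher: the value hierarchy, the scores of 1.0 for equality and 0.5 for a subsumption relationship, the averaging, and the cutoff of 0.6 are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Description = Dict[str, str]

# Hypothetical hierarchy of feature values: child value -> parent value.
VALUE_PARENT = {"maroon": "red", "pink": "red", "turquoise": "blue"}


def feature_score(wanted: str, actual: str) -> float:
    """Score one feature: equality, then direct subsumption, else zero."""
    if wanted == actual:
        return 1.0
    if VALUE_PARENT.get(actual) == wanted or VALUE_PARENT.get(wanted) == actual:
        return 0.5
    return 0.0


def match_score(description: Description, candidate: Description) -> float:
    """Average the feature-level scores over the specified features."""
    scores = [feature_score(value, candidate.get(name, ""))
              for name, value in description.items()]
    return sum(scores) / len(scores) if scores else 0.0


def referent_candidates(
    description: Description,
    scene: Dict[str, Description],
    cutoff: float = 0.6,
) -> List[Tuple[str, float]]:
    """Keep the candidates scoring at or above the cutoff, best first."""
    scored = [(name, match_score(description, obj))
              for name, obj in scene.items()]
    kept = [pair for pair in scored if pair[1] >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


scene = {
    "x": {"color": "maroon", "shape": "round"},
    "y": {"color": "blue", "shape": "square"},
}
print(referent_candidates({"color": "red", "shape": "round"}, scene))
```

As in the text, the score here uses only structural information about the values and nothing about the task domain.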

Order the features and candidates for relaxation. At this 
point the reference system inspects the speaker's description and 
the candidates, decides which features to relax and in what 
order, 7 and generates a master ordering of features for
relaxation. Once the features are in order, the reference system
uses that ordering to determine the order in which to try
relaxing the candidates. 

We draw primarily on sources of linguistic, pragmatic, 
discourse, domain, perceptual, and hierarchical knowledge, as 
well as trial and error during this repair process. A detailed 
treatment of all of them can be found in Goodman (1983-84) and 
Sidner, Goodman, Haas, Moser, Stallard, and Vilain (1984). These 
knowledge sources are consulted to determine the feature ordering 
for relaxation. We represent information from each knowledge 
source as a set of relaxation rules. Most of the rules were 
motivated by the problems illustrated in the protocols. They are 
written in a PROLOG-like language. Figure 5 illustrates one such 
linguistic knowledge relaxation rule. Speakers typically add 
more important information at the end of a description where it 
is separated from the main part and, thus, provides more 
emphasis. The rule in Figure 5 simply embodies the fact that 
relative clauses are found at the end of noun phrases, while 
adjectives are not and, thus, the features of a description that 
are provided adjectivally should be relaxed before those provided 
by a relative clause. However, a more general and more 
applicable rule is that information presented at the end of a 
description is usually more prominent. 

Each knowledge source produces its own partial ordering of
features which are then integrated together. For example,
perceptual knowledge may say to relax color. However, if the
color value was asserted in a relative clause, linguistic
knowledge would rank color lower, i.e., placing it later in the
list of things to relax.

    Relax the features in the speaker's description
    in the order: adjectives, then prepositional
    phrases, and finally relative clauses and
    predicate complements.

    E.g.,

    Relax-Feature-Before(v1,v2)
        [ObjectDescr(d), FeatureDescriptor(v1),
         FeatureDescriptor(v2),
         FeatureInDescription(v1,d),
         FeatureInDescription(v2,d)]
        Equal(syntactic-form(v1,d),"ADJ"),
        Equal(syntactic-form(v2,d),"REL-CLS")

Figure 5: A sample relaxation rule
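
The linguistic rule of Figure 5 can be approximated in Python as a ranking over syntactic forms. This is a sketch rather than the PROLOG-like rule itself; the labels "PP" and "PRED-COMP" and both function names are assumptions.

```python
from typing import Dict, List

# Lower rank means "relax earlier": adjectives, then prepositional
# phrases, then relative clauses and predicate complements.
SYNTACTIC_RANK = {"ADJ": 0, "PP": 1, "REL-CLS": 2, "PRED-COMP": 2}


def relax_feature_before(form1: str, form2: str) -> bool:
    """True if a feature given as form1 is relaxed before one given as form2."""
    return SYNTACTIC_RANK[form1] < SYNTACTIC_RANK[form2]


def linguistic_ordering(feature_forms: Dict[str, str]) -> List[str]:
    """Order features for relaxation by the syntactic form that supplied them."""
    return sorted(feature_forms, key=lambda f: SYNTACTIC_RANK[feature_forms[f]])


# "the rounded maroon device that is large"
forms = {"shape": "ADJ", "color": "ADJ", "size": "REL-CLS"}
print(linguistic_ordering(forms))
```

Because the sort is stable, features with the same syntactic form keep their original order, which leaves them unordered with respect to one another in the sense of a partial ordering.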

Since different knowledge sources generally produce 
different partial orderings of features, this can lead to a 
conflict over which features to relax. It is the job of the best 
candidate algorithm to resolve these disagreements among 
knowledge sources and to order the referent candidates, C1, C2,
..., Cn, so that relaxation is attempted on the best
candidates first, the ones that conform best to a proposed
feature ordering. To start, the algorithm examines candidates in
pairs and the feature orderings from each knowledge source. For
each candidate Ci, the algorithm scores the effect of relaxing
the speaker's original description to Ci, using the feature
ordering from one knowledge source. The score reflects the goal
of minimizing the number of features relaxed while trying to
relax the features that are "earliest" in the feature ordering.
It repeats its scoring of Ci for each knowledge source, and sums
up its scores to form Ci's total score. The Ci's are then
ordered by that score.
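
One way to picture this scoring is the following Python sketch. The penalty function is an assumption, not the report's formula: each relaxed feature costs 1 plus its position in a knowledge source's ordering, orderings are simplified to total orders, and lower totals are treated as better.

```python
from typing import Dict, List, Tuple

Description = Dict[str, str]


def features_to_relax(description: Description, candidate: Description) -> List[str]:
    """Features whose specified value the candidate does not exhibit."""
    return [f for f, v in description.items() if candidate.get(f) != v]


def source_penalty(relaxed: List[str], ordering: List[str]) -> int:
    """Fewer relaxed features, and earlier-ranked ones, cost less."""
    return sum(1 + (ordering.index(f) if f in ordering else len(ordering))
               for f in relaxed)


def order_candidates(
    description: Description,
    candidates: Dict[str, Description],
    orderings: Dict[str, List[str]],
) -> List[Tuple[str, int]]:
    """Sum each candidate's penalty over all knowledge sources; best first."""
    totals = []
    for name, candidate in candidates.items():
        relaxed = features_to_relax(description, candidate)
        total = sum(source_penalty(relaxed, o) for o in orderings.values())
        totals.append((name, total))
    return sorted(totals, key=lambda pair: pair[1])


description = {"color": "maroon", "shape": "rounded", "size": "large"}
candidates = {
    "c1": {"color": "red", "shape": "rounded", "size": "large"},
    "c2": {"color": "maroon", "shape": "square", "size": "small"},
}
orderings = {
    "perceptual": ["color", "shape", "size"],
    "linguistic": ["shape", "color", "size"],
}
print(order_candidates(description, candidates, orderings))
```

In this example c1 requires relaxing only color, which both sources rank early, so it is tried before c2.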

Figure 6 provides a graphic illustration of what the best 
candidate algorithm does. A set of objects in the real world are 
selected by the partial matcher as potential candidates for the 
referent. These candidates are shown across the top of the
figure. The lines on the right side of each box correspond to 
the set of features that describe that object. The speaker's 
description is represented in the center of the figure. The set 
of specified features and their assigned feature value (e.g., the 
pair Color-Maroon) are also shown there. A set of partial 
orderings are generated that suggest which features in the 
speaker's description should be relaxed first- -one ordering for 
each knowledge source (shown as "Linguistic," "Perceptual," and 
"Hierarchical" in the figure). These are put together to form a 
directed graph that represents the possible, reasonable ways to 
relax the features specified in the speaker's description. This 
graph isn't actually built by the best candidate algorithm, but 
helps to illustrate here the consideration of all the partial 
orderings by the algorithm. Finally, the referent candidates are 
reordered using the information expressed in the speaker's 
description and in the directed graph of features. 

Determine which relaxation methods to apply. Once a set of
ordered, potential candidates is selected, the relaxation
mechanism begins step 3 of relaxation; it tries to find proper
methods to relax the features that have just been ordered
(success in finding such methods "justifies" relaxing the
speaker's description to the candidate). It stops at the first
candidate in the list of candidates to which methods can be 
successfully applied. 

Relaxation can take place with many aspects of a speaker's 
description: with complex relations specified in the 
description, with individual features of a referent, or with the 
focus of attention in the real world where one attempts to find a 
match. Complex relations specified in a speaker's description 
include spatial relations (e.g., "the outlet near the top of the
tube"), comparatives (e.g., "the larger tube") and superlatives 
(e.g., "the longest tube"). These can be relaxed, as can simpler 
features of an object (such as size or color) that are specified 
in the speaker's description. 




[Figure 6 diagram: referent candidates across the top; the
speaker's description, "the rounded maroon device that is large,"
with features Color-Maroon, Shape-Rounded, Function-Device, and
Size-Large; knowledge sources (a) Perceptual, (b) Linguistic, and
(c) Hierarchical; features f1 Color, f2 Shape, f3 Function, and
f4 Size; the partial orderings of features; and the directed graph
of features for relaxation.]

Figure 6: Reordering referent candidates

Relaxation has a few global strategies that people can
follow for each part of the description. They can (1) drop the 
errorful feature value from the description altogether, (2) 
weaken or tighten the feature value in a principled way, keeping 
its new value close to the specified one (e.g., movement within a 
subsumption hierarchy of features values), or (3) try some other 
feature value based on some outside information (e.g., knowing 
that people often confuse opposite word pairs such as using 
"hole" for "peg" as illustrated in Excerpt 10). 

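The three strategies above can be sketched as small functions
over a feature/value description. This is a minimal illustration
and not the system's actual code: the dictionaries PARENT and
OPPOSITES, the function names, and the sample description are all
assumptions made for the example, and only the "weaken" half of
strategy (2) is shown.

```python
# Hypothetical data: a tiny subsumption hierarchy of color values and a
# table of commonly confused opposite word pairs (illustrative only).
PARENT = {"MAROON": "RED", "PINK": "RED", "RED": "COLOR", "BLUE": "COLOR"}
OPPOSITES = {"HOLE": "PEG", "PEG": "HOLE"}


def drop_feature(description, feature):
    """Strategy 1: remove the feature from the description altogether."""
    return {f: v for f, v in description.items() if f != feature}


def weaken_feature(description, feature):
    """Strategy 2: move the value one step up the subsumption hierarchy."""
    value = description[feature]
    if value not in PARENT:
        return None  # no principled neighbor is known for this value
    return {**description, feature: PARENT[value]}


def substitute_feature(description, feature):
    """Strategy 3: try another value suggested by outside information."""
    value = description[feature]
    if value not in OPPOSITES:
        return None
    return {**description, feature: OPPOSITES[value]}


description = {"Color": "MAROON", "Kind": "HOLE"}
print(drop_feature(description, "Color"))
print(weaken_feature(description, "Color"))
print(substitute_feature(description, "Kind"))
```

Each function returns a new description and leaves the original
untouched, so several alternative relaxations can be tried from
the same starting point.
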
Often the objects in focus in the real world implicitly 
cause other objects to be in focus (Grosz, 1977; Webber, 1978). 
The subparts of an object, for example, are reasonable candidates 
for the referent of a failing description and should be checked. 
At other times, the speaker might attribute features of a subpart 
of an object to the whole object (e.g., describing a plunger that 
is composed of a red handle, a metal rod, a blue cap, and a green 
cup as "the green plunger"). In these cases, the relaxation 
mechanism utilizes the part-whole relation in object descriptions
to suggest a way to relax the speaker's description. 
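
The "green plunger" case can be illustrated with a short sketch
in which a color test is optionally relaxed along the part-whole
relation. The nested-dictionary representation and the names
colors_of and matches_color are assumptions made for this
example; they are not the KL-One structures used by the system.

```python
def colors_of(obj):
    """Collect the object's own color (if any) and those of all its subparts."""
    colors = {obj["Color"]} if "Color" in obj else set()
    for part in obj.get("Subparts", []):
        colors |= colors_of(part)
    return colors


def matches_color(obj, color, relax_part_whole=False):
    """Strict test on the whole object; optionally accept a subpart's color."""
    if obj.get("Color") == color:
        return True
    return relax_part_whole and color in colors_of(obj)


# The plunger from the text: no color of its own, four differently made parts.
plunger = {
    "Kind": "PLUNGER",
    "Subparts": [
        {"Kind": "HANDLE", "Color": "RED"},
        {"Kind": "ROD", "Composition": "METAL"},
        {"Kind": "CAP", "Color": "BLUE"},
        {"Kind": "CUP", "Color": "GREEN"},
    ],
}

print(matches_color(plunger, "GREEN"))
print(matches_color(plunger, "GREEN", relax_part_whole=True))
```
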

These strategies are realized through a set of procedures
(or relaxation methods) that are organized hierarchically. Each
procedure relaxes its particular type of feature. For example, a
Generate-Similar-Feature-Values procedure is composed of
procedures like Generate-Similar-Shape-Values,
Generate-Similar-Color-Values and Generate-Similar-Size-Values.
Each of those procedures attempts to first relax the feature
value to one "near" or somehow "related" to the current one
(e.g., one would prefer to first relax the color "red" to "pink"
before relaxing it to "blue") and then, if that fails, to try
relaxing it to any of the other possible values (footnote 8). The
effect of the latter case is really the same as if the feature
were simply ignored.
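
A minimal sketch of the "near values first, then anything else"
behavior described for these procedures follows. The color list,
the NEAR table, and the function name are hypothetical; the
source does not give the internals of
Generate-Similar-Color-Values.

```python
# Hypothetical domain data for the illustration.
ALL_COLORS = ["RED", "PINK", "MAROON", "BLUE", "GREEN"]
NEAR = {"RED": ["PINK", "MAROON"]}


def generate_similar_color_values(value):
    """Yield values near the given one first, then every other possible value."""
    near = NEAR.get(value, [])
    yield from near
    for other in ALL_COLORS:
        if other != value and other not in near:
            yield other


print(list(generate_similar_color_values("RED")))
```

Because the values are produced lazily and in preference order, a
caller can stop at the first value that yields a match; running
through the whole sequence amounts to ignoring the feature.
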

3.3 An Example of Misreference Resolution

This section describes how a referent identification system 
can recover from a misreference using the scheme outlined in the 
previous section. For the purposes of this example, assume that 
the water pump objects currently in focus include the CAP, the 
MAINTUBE, the AIRCHAMBER and the STAND. Assume also that the 
speaker tries to describe two of the objects--the MAINTUBE and
the AIRCHAMBER. 



    DescrA:  "...two devices that are clear plastic.

    DescrB:  One of them has two openings on the outside
             with threads on the end, and it's about five
             inches long.

    DescrC:  The other one is a rounded piece with a
             turquoise base on it.

    DescrD:  Both are tubular.

    DescrE:  The rounded piece fits loosely over..."



The reference system can find a unique referent for the first
object (described by DescrA, DescrB and DescrD) but not for the
second (described by DescrA, DescrC, DescrD and DescrE). The
relaxation algorithm, shown below, reduces the set of referent
candidates for the second one down to two. It then requires
the system/listener to try out those candidates to determine if
one, or both, fits loosely. The protocols exhibit a similar
result when the listener uses "fits loosely" to get the correct
referent (e.g., Excerpt 6 exemplifies where "fit" is used by the
speaker to help confirm that the proper referent was found). Our
system simulates this test by asking the user about the fit.

Figure 7 provides a simplified and linearized view of the
actual KL-One representation of the speaker's descriptions after
they have been parsed and semantically interpreted. A
representation of each of the water pump objects that are
currently under consideration (i.e., in focus) is presented in
Figure 8. Each provides a physical description of the object--in
terms of its dimensions, the basic 3-D shapes composing it, and
its physical features--and a functional description of the
object. The first entry in each representation in Figure 8
(shown in uppercase) defines the basic kind of entity being
described. The words in mixed case refer to the names of
features and the words in uppercase refer to possible fillers of
those features from things in the water pump world. The
"Subpart" feature provides a place for an embedded description of
an object that is a subpart of a parent object which can be
referred to either on its own or as part of the parent object.
The "Orientation" feature, used in the representations in Figure
8, provides a rotation and translation of the object from some
standard orientation, which provides a way to define relative
positions such as "top," "bottom," or "side," to the object's
current orientation in 3-D space. Figure 9 shows the KL-One
taxonomy representing the same objects.

The first step in the reference process is the actual search
for a referent in the knowledge base. In people, the reference
identification process is incremental, i.e., the listener can
begin the search process before he hears the complete
description, as was observed in the videotape excerpts. We try
to simulate this incremental nature in our algorithm, as is
apparent from the placement of the first description in DescrD
into the KL-One taxonomy shown in Figure 9. DescrD is
incrementally defined by first adding DescrA--as shown in Figure
10--and then DescrB--as shown in Figure 12--to the taxonomy. The
KL-One Classifier compares the features specified in the
speaker's descriptions with the features for each element in the
KL-One taxonomy that corresponds to one of the current objects of
interest in the real world. Notice that some features are
directly comparable. For example, the "Transparency" feature of
DescrA and the "Transparency" feature of MAINTUBE are both equal
to "CLEAR." All the other features specified in DescrA fit the
MAINTUBE so the MAINTUBE can be described by DescrA. This is
illustrated in Figure 11 where MAINTUBE is shown as a subconcept
of DescrA.
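
The direct feature comparison just described can be sketched as a
simple "does this description cover this object" test. This is
only an approximation under stated assumptions, not the KL-One
Classifier: the flat dictionaries, the KIND_OF table (including
the entry for CAP), and the function names are hypothetical.

```python
# Hypothetical kind hierarchy; only TUBE and CONTAINER being kinds of
# DEVICE is stated in the text, the CAP entry is an assumption.
KIND_OF = {"TUBE": "DEVICE", "CONTAINER": "DEVICE", "CAP": "DEVICE"}


def is_kind_of(kind, ancestor):
    """Walk up the kind hierarchy looking for the ancestor."""
    while kind is not None:
        if kind == ancestor:
            return True
        kind = KIND_OF.get(kind)
    return False


def describes(description, obj):
    """True when the object's kind and every listed feature fit the description."""
    if not is_kind_of(obj["Kind"], description["Kind"]):
        return False
    return all(obj.get(f) == v for f, v in description.items() if f != "Kind")


descr_a = {"Kind": "DEVICE", "Transparency": "CLEAR", "Composition": "PLASTIC"}
maintube = {"Kind": "TUBE", "Color": "VIOLET",
            "Transparency": "CLEAR", "Composition": "PLASTIC"}
cap = {"Kind": "CAP", "Color": "BLUE",
       "Transparency": "OPAQUE", "Composition": "PLASTIC"}

print(describes(descr_a, maintube))
print(describes(descr_a, cap))
```
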

DescrA: (DEVICE (Transparency CLEAR)
                (Composition PLASTIC))

DescrB: (DEVICE (Transparency CLEAR)
                (Composition PLASTIC)
                (Subpart (OPENING))
                (Subpart (OPENING))
                (Subpart
                  (THREADS (Rel-Position END)))
                (Dimensions (Length 5.0)))

DescrC: (DEVICE (Transparency CLEAR)
                (Composition PLASTIC)
                (Shape ROUND)
                (Subpart (BASE (Color TURQUOISE))))

DescrD: (DEVICE (Transparency CLEAR)
                (Composition PLASTIC)
                (Subpart (OPENING))
                (Subpart (OPENING))
                (Subpart
                  (THREADS (Rel-Position END)))
                (Dimensions (Length 5.0))
                (Analogical-Shape TUBULAR))
        (DEVICE (Transparency CLEAR)
                (Composition PLASTIC)
                (Shape ROUND)
                (Analogical-Shape TUBULAR)
                (Subpart (BASE (Color TURQUOISE))))

DescrE: (FIT-INTO
          (Outer (DEVICE (Transparency CLEAR)
                         (Composition PLASTIC)
                         (Shape ROUND)
                         (Analogical-Shape TUBULAR)
                         (Subpart
                           (BASE (Color TURQUOISE)))))
          (Inner ...)
          (FitCondition LOOSE))

Figure 7: The speaker's descriptions


STAND also is shown as a subconcept of DescrA; AIR CHAMBER is
shown as a possible subconcept (with the dotted arrow) because
DescrA mismatches with it on one of its subparts (footnote 9).
Other features require in-depth processing--that is outside the
capability of the KL-One classifier--before they can be compared.
The OPENING value of "Subpart" in DescrB provides a good example
of this. Consider comparing it to the "Subpart" entries for
MAINTUBE shown in Figure 8. An OPENING, as seen in Figure 13,

(Values that are illegible in the source are shown as "...".)

CAP            (CAP (Color BLUE)
                    (Composition PLASTIC)
                    (Transparency OPAQUE)
                    (Dimensions ...)
                    (Orientation ...))

MAINTUBE       (TUBE (Color VIOLET)
                     (Composition PLASTIC)
                     (Transparency CLEAR)
                     (Dimensions (Length 4.125))
                     (Subpart (CYLINDER (Dimensions ...)
                                        (Orientation ...)
                                        (Function ...)))
                     (Subpart (CYLINDER (Dimensions ...)
                                        (Orientation ...)))
  Threads            (Subpart (CYLINDER (Dimensions ...)
                                        (Orientation (Rotation ...)
                                                     (Translation (0.0 0.0 0.0)))
                                        (Function THREADED-ATTACHMENT-POINT)))
  Outlet1            (Subpart (CYLINDER (Dimensions ...)
                                        (Orientation ...)
                                        (Function OUTLET-ATTACHMENT-POINT)))
  Outlet2            (Subpart (CYLINDER (Dimensions ...)
                                        (Orientation ...)
                                        (Function OUTLET-ATTACHMENT-POINT))))

AIR CHAMBER    (CONTAINER (Dimensions (Length 2.75))
                          (Composition PLASTIC)
  ChamberTop     (Subpart (HEMISPHERE (Color VIOLET)
                                      (Transparency CLEAR)
                                      (Dimensions (Diameter 1.0))
                                      (Orientation (Rotation ...)
                                                   (Translation (0.0 0.0 2.25)))))
  ChamberBody    (Subpart (CYLINDER (Color VIOLET)
                                    (Transparency CLEAR)
                                    (Dimensions ...)
                                    (Orientation (Rotation ...)
                                                 (Translation (0.0 0.0 .375)))))
  ChamberBottom  (Subpart (CYLINDER (Color BLUE)
                                    (Transparency OPAQUE)
                                    (Dimensions ...)
                                    (Orientation (Rotation ...)
                                                 (Translation (0.0 0.0 0.0)))
                                    (Function ...)
                                    (Subpart (CYLINDER (Color BLUE)
                                                       (Dimensions (Length .375)
                                                                   (Diameter .5))
                                                       (Orientation
                                                         (Rotation (0.0 0.0 0.0))
                                                         (Translation (0.0 0.0 0.0)))
                                                       (Function
                                                         OUTLET-ATTACHMENT-POINT)))))
  ChamberOutlet  (Subpart (CYLINDER (Color VIOLET)
                                    (Transparency CLEAR)
                                    (Dimensions ...)
                                    (Orientation ...)
                                    (Function ...))))

STAND          (TUBE (Dimensions (Length 2.75))
                     (Composition PLASTIC)
                     (Subpart (CYLINDER (Color BLUE)
                                        (Transparency CLEAR)
                                        (Dimensions ...)
                                        (Orientation ...)
                                        (Function OUTLET-ATTACHMENT-POINT)))
  Base               (Subpart (CYLINDER (Color BLUE)
                                        (Transparency CLEAR)
                                        (Dimensions ...)
                                        (Orientation (Rotation (0.0 0.0 0.0))
                                                     (Translation (0.0 0.0 0.0)))
                                        (Function ...))))

Figure 8: The objects in focus

Figure 12: Adding DescrB to the taxonomy

Figure 13: Attempt to match OPENING to CYLINDER'
(Diagram not reproduced. Its recoverable node labels are:
2D-Object, 3D-Object, Cylinder', Outlet Attachment-Point, End,
2D-End, 3D-End, and Open 2D-End.)

is thought of primarily as a 2-D cross-section (such as a
"hole"), while the two CYLINDER subparts of MAINTUBE are viewed
as (3-D) cylinders that have the "Function" of being outlets,
i.e., OUTLET-ATTACHMENT-POINTs. To compare OPENING and one of
the cylinders, say CYLINDER, the inference must be made that both
things can describe the same thing (similar inferences are
developed in Mark, 1982). One way this inference can occur
is by recursively examining the subparts of MAINTUBE (and their
subparts, etc.), with the KL-One partial matcher until the
cylinders are examined at the 2-D level. At that level, an end
of the cylinder will be defined as an OPENING. With that
examination, the MAINTUBE can be seen as described by DescrB.
This inference process is illustrated in Figure 13. There the
partial matcher examines the roles Lip, Outlet1, and Outlet2 of
MAINTUBE which represent its subparts and determines the
following:

    A CYLINDER can have an End which is either a 2D-End (e.g.,
    a lid or hole) or a 3D-End (e.g., a lip).

    A 2D-End is either an OPEN-2D-END (e.g., a hole) or a
    CLOSED-2D-END (e.g., a lid on a can).

    An OPEN-2D-END is a kind of OPEN-2D-OBJECT.

These facts imply that OPENING can match any of the subparts Lip,
Outlet1, or Outlet2 on MAINTUBE since those subparts are
defined as cylinders that function as outlets (i.e.,
Outlet-Attachment-Points).
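
The chain of facts above can be followed mechanically with a
small recursive search. The sketch below is an illustration only:
the two tables, the function name, and in particular the
assumption that OPENING can be identified with OPEN-2D-OBJECT are
ours; the KL-One partial matcher itself is not shown.

```python
# "Can be refined into" links taken from the three facts listed above.
REFINES = {
    "CYLINDER": ["2D-END", "3D-END"],           # a cylinder's End is one of these
    "2D-END": ["OPEN-2D-END", "CLOSED-2D-END"],
}
KIND_OF = {"OPEN-2D-END": "OPEN-2D-OBJECT"}

# Assumption for this sketch: an OPENING is treated as an OPEN-2D-OBJECT.
OPENING = "OPEN-2D-OBJECT"


def can_be_viewed_as(concept, target):
    """Recursively examine a concept's refinements until the target is reached."""
    if concept == target or KIND_OF.get(concept) == target:
        return True
    return any(can_be_viewed_as(child, target)
               for child in REFINES.get(concept, []))


print(can_be_viewed_as("CYLINDER", OPENING))
print(can_be_viewed_as("3D-END", OPENING))
```
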

DescrC poses different problems. DescrC refers to an object
that is supposed to have a subpart that is TURQUOISE. The
Classifier determines that DescrC could not describe either the
CAP or STAND because both are BLUE. It also could not describe
the MAINTUBE (footnote 10) or AIR CHAMBER since each has subparts
that are either VIOLET or BLUE. The Classifier places DescrC as
best it can in the taxonomy, showing no connection between it and
any of the objects currently in focus. DescrD provides no further
help and is similarly placed. This is shown in Figure 14. At this
point, a probable misreference is noted. The reference mechanism
now tries to find potential referent candidates, using the
taxonomy exploration routine described in Section 3.2.3, by
examining the elements closest to DescrD in the taxonomy and
using the partial matcher to score how close each element is to
DescrD (footnote 11). This is illustrated in Figure 15. The
matcher determines MAINTUBE, STAND, and AIR CHAMBER as reasonable
candidates by aligning and comparing their features to DescrD.

Figure 14: Adding DescrC and DescrD to the taxonomy

Scoring DescrD to MAINTUBE:

    a TUBE is a kind of DEVICE; (>)

    the Transparency of each is CLEAR; (+)

    the Composition of each is PLASTIC; (+)

    a TUBE implies Analogical-Shape TUBULAR, which implies
    Shape CYLINDRICAL, which is a kind of Shape ROUND; (>)

    the recursive partial matching of subparts: A BASE is
    viewed as a kind of BOTTOM. Therefore, BASE in DescrD
    could match to the subpart in MAINTUBE that has a
    Translation of (0.0 0.0 0.0)--i.e., Threads of
    MAINTUBE. However, they mismatch since color TURQUOISE in
    DescrD differs from color VIOLET of MAINTUBE. (-)

Scoring DescrD to STAND:

    a TUBE is a kind of DEVICE; (>)

    the Transparency of each is CLEAR; (+)

    the Composition of each is PLASTIC; (+)

    a TUBE implies Analogical-Shape TUBULAR, which implies
    Shape CYLINDRICAL, which is a kind of Shape ROUND; (>)

    the recursive partial matching of subparts: BASE in
    DescrD could match to the subpart in STAND that has a
    Translation of (0.0 0.0 0.0)--i.e., Base of STAND.
    However, they mismatch since color TURQUOISE in DescrD
    differs from color BLUE of STAND. (-)

Scoring DescrD to AIR CHAMBER:

    a CONTAINER is a kind of DEVICE; (>)

    the Transparency of DescrD, CLEAR, matches the
    Transparency of ChamberTop, ChamberOutlet and ChamberBody
    of AIR CHAMBER, but mismatches the Transparency of
    ChamberBottom of AIR CHAMBER. Therefore, the partial match
    is uncertain; (?)

    the Composition of each is PLASTIC; (+)

    the subparts of AIR CHAMBER have Shape HEMISPHERICAL and
    CYLINDRICAL which are each a kind of Shape ROUND; (>)

    the recursive partial matching of subparts: BASE in
    DescrD could match to the subpart in AIR CHAMBER that has
    a Translation of (0.0 0.0 0.0)--i.e., ChamberBottom of
    AIR CHAMBER. However, they mismatch since color TURQUOISE
    in DescrD differs from color BLUE of AIR CHAMBER. (-)
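
The role scores listed above can be collected into a table and
combined into one number per candidate. The sketch below only
illustrates the idea: the numeric values assigned to the symbols
and the equal default weights are assumptions, since the text
says only that a weighted, overall numerical score is produced.

```python
# Assumed numeric values; the text fixes only the ordering of the symbols.
SYMBOL_VALUE = {"+": 1.0, ">": 0.75, "<": 0.75, "?": 0.25, "-": 0.0}

ROLE_SCORES = {
    "MAINTUBE": {"SuperC": ">", "Transparency": "+", "Composition": "+",
                 "Shape": ">", "Subparts": "-"},
    "STAND": {"SuperC": ">", "Transparency": "+", "Composition": "+",
              "Shape": ">", "Subparts": "-"},
    "AIR CHAMBER": {"SuperC": ">", "Transparency": "?", "Composition": "+",
                    "Shape": ">", "Subparts": "-"},
}


def overall_score(role_scores, weights=None):
    """Weighted average of the role scores (equal weights unless given)."""
    weights = weights or {role: 1.0 for role in role_scores}
    total = sum(weights[role] for role in role_scores)
    return sum(weights[role] * SYMBOL_VALUE[symbol]
               for role, symbol in role_scores.items()) / total


def differing_roles(a, b):
    """Roles on which two candidates received different symbols."""
    return [role for role in a if a[role] != b[role]]


for name, scores in ROLE_SCORES.items():
    print(name, round(overall_score(scores), 3))
print(differing_roles(ROLE_SCORES["MAINTUBE"], ROLE_SCORES["AIR CHAMBER"]))
```
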

Figure 16 summarizes the scoring. A weighted, overall numerical
score is generated from the scores shown there.

                 SuperC  Composition  Transparency  Shape  Subparts
    Maintube       >          +            +          >       -
    Stand          >          +            +          >       -
    Air Chamber    >          +            ?          >       -

    Range of role scores, from low to high correlation:
    -, ?, ~, < or >, +

Figure 16: Scoring DescrD to the referent candidates

The above analysis using the partial matcher provides no clear
winner since the differences are so small, causing the scores
generated for the candidates to be almost exactly the same
(i.e., the only difference was in the score for Transparency).
All candidates, hence, will be retained for now.

At this point, the knowledge sources and their associated rules
that were mentioned earlier apply. These rules attempt to order
the feature values in the speaker's description for relaxation.
First, we order the features in DescrD using linguistic
knowledge. Linguistic analysis of DescrD, "... are clear plastic
... a rounded piece with a turquoise base.... Both are tubular
... fits loosely over ...," tells us that the features were
specified using the following modifiers:

    Adjective: (Shape ROUND)
    Prepositional Phrase: (Subpart (BASE (Color TURQUOISE)))
    Predicate Complement: (Transparency CLEAR),
        (Composition PLASTIC), (Analogical-Shape TUBULAR),
        (Fit LOOSE)

Observations from the protocols (as described above) have shown
that people tend to relax first those features specified as
adjectives, then as prepositional phrases and finally as relative
clauses or predicate complements. Figure 5 shows this rule. The
rule suggests relaxation of DescrD in the order:

    {Shape} < {Color, Subpart}
            < {Transparency, Composition, Analogical-Shape, Fit}

The set of features on the left side of a "<" symbol is relaxed
before the set on the right side. The order in which the
features inside the braces, "{...}," are relaxed is not specified
(i.e., any order of relaxation is all right).

Perceptual information about the domain also provides
suggestions. Whenever a feature has feature values that are
close, then one should be prepared to relax any of them to any of
the others (we call this the "clustered feature value rule").
Figure 17 illustrates a set of assertions that compose a data
base of similar color values in some domain. The Similar-Color
predicate is defined to be reflexive and symmetric but not
transitive. In this example, since a number of the color pairs
are very close, color may be a reasonable thing to relax (see
Figure 18). The clustered color rule defined in Figure 19 would
suggest such a relaxation. It requires that there are at least
three objects in the world that have similar colors. It is meant
as an exemplar for a whole series of rules (e.g., Clustered Shape
Values, Clustered Transparency Values, and so on).

Hierarchical information about how closely related one feature
value is to another can also be used to determine what to relax.
The Shape values are a good example, as shown in Figure 20. A
CYLINDRICAL shape is also a CONICAL shape, which is also a 3-D
ROUND shape. Hence, it is very reasonable to match ROUNDED to
CYLINDRICAL. All of these suggestions can be put together to
form the order:

    {Shape, Color} < {Subpart}
                   < {Transparency, Composition,
                      Analogical-Shape, Fit}.
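
The two orderings given in this section can be reproduced with a
short sketch: features are first grouped into tiers by the kind
of modifier that introduced them, and a feature favored by
another knowledge source is then moved into the first tier. The
promote step is only one assumed way of combining suggestions;
the text does not spell out the merging procedure, and all names
below are illustrative.

```python
# Relaxation preference by syntactic modifier, earliest tier first.
MODIFIER_RANK = {"adjective": 0, "prepositional phrase": 1,
                 "predicate complement": 2}

FEATURES = {
    "Shape": "adjective",
    "Color": "prepositional phrase",
    "Subpart": "prepositional phrase",
    "Transparency": "predicate complement",
    "Composition": "predicate complement",
    "Analogical-Shape": "predicate complement",
    "Fit": "predicate complement",
}


def linguistic_order(features):
    """Group features into tiers; features within a tier are unordered."""
    tiers = [set() for _ in MODIFIER_RANK]
    for feature, modifier in features.items():
        tiers[MODIFIER_RANK[modifier]].add(feature)
    return [tier for tier in tiers if tier]


def promote(tiers, feature):
    """Move one feature into the first tier (e.g., a clustered feature)."""
    remaining = [tier - {feature} for tier in tiers]
    remaining[0] = remaining[0] | {feature}
    return [tier for tier in remaining if tier]


order = linguistic_order(FEATURES)
print(order)
print(promote(order, "Color"))
```
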



    Similar-Color("BLUE","VIOLET") <-
    Similar-Color("BLUE","TURQUOISE") <-
    Similar-Color("GREEN","TURQUOISE") <-
    Similar-Color("RED","PINK") <-
    Similar-Color("RED","MAROON") <-
    Similar-Color("RED","MAGENTA") <-
    ...

Figure 17: Similar color values 

The referent candidates MAINTUBE, STAND, and AIR CHAMBER can
be examined and possibly ordered themselves using the above
feature ordering. For this example, the relaxation of DescrD to
any of the candidates requires relaxing their SHAPE and COLOR
features. Since they each require relaxing the same features,
the candidates cannot be ordered with respect to each other.
Hence, no one candidate stands out as the most likely referent.

Colors of candidates and DescrD:

    MainTube - violet
    Stand - blue
    Air Chamber - violet, blue
    DescrD - turquoise

Retrieve those Similar-Color assertions in the data base for the
colors BLUE, VIOLET and TURQUOISE:

    Similar-Color("BLUE","VIOLET") <-
    Similar-Color("BLUE","TURQUOISE") <-
    Similar-Color("GREEN","TURQUOISE") <-

Figure 18: Objects with similar colors



One can relax a feature whose feature values
are clustered closely together before those of a
non-clustered feature.

    ClusteredFeatureValues(COLOR,w)
        <- Feature(COLOR), World(w),
           ColorValue(c1), ColorValue(c2), ColorValue(c3),
           WorldObj(o1,w), WorldObj(o2,w), WorldObj(o3,w),
           Color(c1,o1), Color(c2,o2), Color(c3,o3),
           Similar-Color(c1,c2), Similar-Color(c1,c3),
           Similar-Color(c2,c3)

    Relax-Feature-Before(v1,v2)
        <- ClusteredFeatureValues(feature(v1),w),
           NOT(ClusteredFeatureValues(feature(v2),w))

Figure 19: The clustered color value rule
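
A minimal executable reading of the clustered color value rule is
sketched below. It assumes, as a simplification, that each object
has a single color and that the rule fires when any three objects
have pairwise similar colors; the function names and the sample
world are hypothetical.

```python
from itertools import combinations

# Similar-Color assertions from Figure 17, stored as unordered facts.
SIMILAR_PAIRS = {
    ("BLUE", "VIOLET"), ("BLUE", "TURQUOISE"), ("GREEN", "TURQUOISE"),
    ("RED", "PINK"), ("RED", "MAROON"), ("RED", "MAGENTA"),
}


def similar_color(a, b):
    """Reflexive and symmetric, but deliberately not transitive."""
    return a == b or (a, b) in SIMILAR_PAIRS or (b, a) in SIMILAR_PAIRS


def clustered_color_values(world):
    """True when at least three objects have pairwise similar colors."""
    for trio in combinations(world.values(), 3):
        if all(similar_color(a, b) for a, b in combinations(trio, 2)):
            return True
    return False


# Simplified world: one color per object.
world = {"CAP": "BLUE", "MAINTUBE": "VIOLET", "STAND": "BLUE"}
print(clustered_color_values(world))
print(similar_color("VIOLET", "TURQUOISE"))
```

Because similarity is not transitive, VIOLET and TURQUOISE are
not treated as similar even though each is similar to BLUE.
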

While no ordering of the candidates was possible, the order
generated to relax the features in the speaker's description can
still be used to guide the relaxation of each candidate. The
relaxation methods mentioned at the end of the last section come
into use here. Consider the shape values. The goal is to see if
the ROUND shape specified in the speaker's description is similar
to the shape values of each candidate.




Figure 20: Hierarchical shape knowledge

Generate-Similar-Shape-Values determines that it is reasonable to
match ROUND to either the CYLINDRICAL or HEMISPHERICAL shapes of
the AIR CHAMBER by examining the taxonomy shown in Figure 20 and
noting that both shapes are below ROUND and 3D-ROUND. Notice
that it is less reasonable to match CYLINDRICAL to HEMISPHERICAL
since they are in different branches of the taxonomy. This holds
equally true for the CYLINDRICAL shapes of the MAINTUBE and the
STAND. Generate-Similar-Color-Values next tries relaxing the
Color TURQUOISE. The assertions Similar-Color("BLUE",
"TURQUOISE") <- and Similar-Color("GREEN","TURQUOISE") <- are
found as rules containing TURQUOISE. The colors BLUE and GREEN
are, thus, the best alternates.

Here, only two clear winners exist--the AIR CHAMBER and the
STAND--while the MAINTUBE is dropped as a candidate since it is
reasonable to relax TURQUOISE to BLUE or to GREEN but not to
VIOLET. Subpart, Transparency, Analogical-Shape, and Composition
provide no further help (though the fact that the AIR CHAMBER
has both CLEAR and OPAQUE subparts could be used to put it
slightly lower than the STAND, whose subparts are all CLEAR; this
difference, however, is not significant). This leaves trial and
error attempts to try to complete the FIT action specified in
DescrE. The one (if any) that fits--and fits loosely--is
selected as the referent. The protocols showed that people often
do just that--reducing their set of choices down as best they can
and then taking each of the remaining choices and trying out the
requested action on them.
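
The last two steps of the example, pruning by relaxed color and
then testing the fit, can be sketched as follows. The data
structures and function names are assumptions, and the answer
returned by the fit test below is invented for the illustration;
in the system described here the user is asked about the fit.

```python
# Colors found on each candidate (including its subparts), from the example.
CANDIDATE_COLORS = {
    "MAINTUBE": {"VIOLET"},
    "STAND": {"BLUE"},
    "AIR CHAMBER": {"VIOLET", "BLUE"},
}

# Alternates found for TURQUOISE in the Similar-Color assertions.
TURQUOISE_ALTERNATES = {"BLUE", "GREEN"}


def relax_color(candidates, alternates):
    """Keep the candidates that have at least one of the alternate colors."""
    return [name for name, colors in candidates.items() if colors & alternates]


def pick_by_fit(candidates, fits_loosely):
    """Try the requested action on each remaining candidate in turn."""
    for name in candidates:
        if fits_loosely(name):
            return name
    return None  # nothing fits: give up and ask for help


remaining = relax_color(CANDIDATE_COLORS, TURQUOISE_ALTERNATES)
print(remaining)

# Stand-in for asking the user; this particular answer is made up.
print(pick_by_fit(remaining, lambda name: name == "AIR CHAMBER"))
```
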

4 Conclusion

Our goal in this work is to build robust natural language 
understanding systems, allowing them to detect and avoid 
miscommunication. The goal is not to make a perfect listener but 
a more tolerant one that could avoid many mistakes, though it may 
still be wrong on occasion. In this paper, we introduced a 
taxonomy of miscommunication problems that occur in expert- 
apprentice dialogues. We showed that reference mistakes are one 
kind of obstacle to robust communication. To tackle reference 
errors, we described how to extend the succeed/fail paradigm
followed by previous natural language researchers. 

We represented real world objects hierarchically in a 
knowledge base using a representation language, KL-One, that 
follows in the tradition of semantic networks and frames. In 
such a representation framework, the reference identification 
task looks for a referent by comparing the representation of the 
speaker's input to elements in the knowledge base by using a 
matching procedure. Failure to find a referent in previous
reference identification systems resulted in the unsuccessful 
termination of the reference task. We claim that people behave 
better than this and explicitly illustrated such cases in an 
expert-apprentice domain about toy water pumps. 

We developed a theory of relaxation for recovering from
reference failures that provides a much better model for human
performance. When people are asked to identify objects, they
behave in a particular way: find candidates, adjust as
necessary, re-try, and, if necessary, give up and ask for help.
We claim that relaxation is an integral part of this process and
that the particular parameters of relaxation differ from task to
task and person to person. Our work models the relaxation
process and provides a computational model for experimenting with
the different parameters. The theory incorporates the same
language and physical knowledge that people use in performing
reference identification to guide the relaxation process. This
knowledge is represented as a set of rules and as data in a
hierarchical knowledge base. Rule-based relaxation provided a
methodical way to use knowledge about language and the world to
find a referent. The hierarchical representation made it
possible to tackle issues of imprecision and over-specification
in a speaker's description. It allows one to check the position
of a description in the hierarchy and to use that position to
judge imprecision and over-specification and to suggest possible
repairs to the description.

Interestingly, one would expect that "closest" match would
suffice to solve the problem of finding a referent. We showed,
however, that it doesn't usually provide you with the correct
referent. Closest match isn't sufficient because there are many
features associated with an object and, thus, determining which
of those features to keep and which to drop is a difficult
problem due to the combinatorics and the effects of context. The
relaxation method described circumvents the problem by using the
knowledge that people have about language and the physical world
to prune down the search space.

This paper mentioned only a small aspect of what needs to be
done with miscommunication. There are much broader problems that
we also want to address. We alluded in the paper to problems due
to metonymy--the use of the name of one thing for that of
another--but never really tried in this work to handle more than
a few special cases of it. There are also miscommunication
problems that are outside of the reference area. We need to
consider full utterances and the associated discourse in which
they appear. Utterances can be imprecise or ill-formed with
respect to the current discourse. The goals specified by a
speaker through a particular utterance or discourse could be
confused. For example, a speaker's requested goal could be
outside the scope of the domain being discussed. We believe that
our model will help solve the problem for this bigger picture.
In particular, we feel the negotiation method will be important
here, too. The negotiation process will become part of the plan
recognition section of a natural language system. There a search
of the plan space for the set of plans that might fit the
utterance or sequence of utterances would be performed. A
relaxation component related in style to the one outlined in this
paper could be used to provide an orderly relaxation of the
speaker's utterances to fit the plans and the domain world. This
process will require more interaction with the speaker through
the use of clarification dialogues.

Footnotes

1 This research was supported in part by the Defense Advanced
Research Projects Agency under contract N00014-77-C-0378.

2 An analysis of clarification subdialogues can be found in
(Litman & Allen, 1984).

3 Of course, there are some situations--such as teaching--where
the hearer would be more willing to tolerate overspecific
descriptions.

4 "Chamber" was interpreted here in a broader sense by the
listener because it was used right at the beginning of the
dialogue before the speaker introduced other terms such as
"tube" that would have better helped to distinguish the pieces.
The example demonstrates how discourse affects reference.

5 The whole word here is "plastic." In these protocols,
people often guess before hearing the whole utterance or even
whole words.

6 Grosz (1977, 1981) would describe this as a difference in
"task plans" while Reichman (1978, 1981) would say that the
"communicative goals" differed.

7 Of course, once one particular candidate is selected, then
deciding which features to relax is relatively trivial--one
simply compares features of the candidate description (the
target) to the speaker's description (the pattern) and notes any
discrepancies.

8 The latter case is there primarily for the times when one
can't easily define a similarity metric for a feature. McCoy
(1985) and Tversky (1977) provide additional discussions about
similarity metrics.

9 We are stretching the definition of KL-One here with the
dotted subsumption arrow. The point we want to make is that the
AIRCHAMBER is similar to DescrA because their descriptions are
almost exactly the same.

10 Since DescrB refers to MAINTUBE, MAINTUBE could be dropped
as a potential referent candidate for DescrC. We will, however,
leave it as a potential candidate to make this example more
complex.

11 The partial matcher scores are numerical scores computed
from a set of role scores that indicate how well each feature of
the two descriptions match. Those feature scores are represented
on a scale: {+}, {> or <}, {~}, {?}, {-}. + is the highest
and - is the lowest score. > and < have the same score but the
algorithm can distinguish between them.